CHX Loses A Single Data Center’s Database Access, But Network Between Data Centers and Production FileShare are OK.

Situation

Database access is lost in either CH2 or NY4, but network connectivity between the two data centers is intact.

Production FileShare access is also working as expected between data centers.

Impacts

During Trading Hours:

Trading could continue, but database loading in the affected data center would no longer be up to date, and clerical support for order research and/or trade corrections in the affected data center would be impossible.

Any process that needs to start in the affected data center would be unable to do so.

Post Trade Processing that uses the affected data center would likely be invalid.

All clerical and administrative operations, including all Post Trade Processing, would need to rely on the working data center’s database.

Non-Java application startups would need to rely on a TNS_NAMES.ORA file that points to the working data center’s database. These applications could start in their primary data center during this recovery if desired.

Java application startups would need to rely on a TNS_NAMES.ORA file that points to the working data center’s database, as well as on database-specific configurations located on local servers in each data center. They may need to start in the alternate data center, depending on which database is affected.

Response

1)      Notify management.

2)      If the decision is made to rely exclusively on the working data center’s database for the rest of the day, copy the working data center’s TNS_NAMES.ORA file over the affected data center’s TNS_NAMES.ORA file in the chxappcfg folders.

3)      Stop any database loaders that are trying to write to the affected database.  These should remain down.

4)      Confirm that all database loading to the working database is up to date.

5)      Start/Restart any required Java application in the same data center as the working database.  These applications include: BPLX, CSI, MNT, MROL and OMS.

6)      Start/Restart any required non-Java application as necessary in its primary data center. 

7)      Consider suspending trading only if MEs need to restart.

-          Use NTM Control ME commands before stopping/restarting MEs.

-          If MEs crash, rely on restart to halt trading automatically.

8)      Note that OSF Simulators have hard-coded database references in their XML configurations.  If the simulators will be used for testing, these configurations may need to be changed.

9)      Production Support should also be cognizant of which database is being used in ER queries.

10)   Change Post Trade Processing to work from the single healthy data center.
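
The file copy in step 2 can be sketched as follows. This is a minimal illustration in Python, not the documented CHX procedure: the function names, the backup naming scheme, and the placeholder paths are all hypothetical, since this document does not give the actual chxappcfg folder locations. The sketch keeps a timestamped backup of the affected data center’s file before overwriting it, and includes a helper that lists the HOST entries in a TNS_NAMES.ORA file so an operator can confirm it now points at the working database.

```python
from __future__ import annotations

import re
import shutil
from datetime import datetime
from pathlib import Path

# Hypothetical placeholders: substitute the real chxappcfg folder paths.
WORKING_TNS = Path("/path/to/working/chxappcfg/TNS_NAMES.ORA")
AFFECTED_TNS = Path("/path/to/affected/chxappcfg/TNS_NAMES.ORA")


def replace_tns_file(working: Path, affected: Path) -> Path:
    """Back up the affected file, then overwrite it with the working copy.

    Returns the backup path so the change can be reverted after recovery.
    """
    stamp = datetime.now().strftime("%Y%m%d%H%M%S")
    backup = affected.with_name(f"{affected.name}.{stamp}.bak")
    shutil.copy2(affected, backup)   # preserve the original for rollback
    shutil.copy2(working, affected)  # overwrite with the working data center's file
    return backup


def hosts_referenced(tns_file: Path) -> set[str]:
    """Return the lower-cased HOST values found in a TNS_NAMES.ORA-style file."""
    text = tns_file.read_text()
    matches = re.findall(r"\bHOST\s*=\s*([^\s)]+)", text, flags=re.IGNORECASE)
    return {host.lower() for host in matches}
```

Calling `replace_tns_file(WORKING_TNS, AFFECTED_TNS)` and then `hosts_referenced(AFFECTED_TNS)` would show whether any entry still names a host in the affected data center. The backup makes it straightforward to restore the original file once the affected database is available again.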

 

Scope of Impact Response Considerations:

1)      This scenario documents a very specific problem with a well-defined scope of impact and response.